Search results for "Distributed memory"

Showing 10 of 13 documents

Parallel Schwarz methods for convection-dominated semilinear diffusion problems

2002

Abstract: Parallel two-level Schwarz methods are proposed for the numerical solution of convection-diffusion problems, with the emphasis on convection-dominated problems. Two variants of the methodology are investigated; they differ in the type of boundary conditions (Dirichlet- or Neumann-type) posed on part of the second-level subdomain interfaces. Convergence properties of the two-level Schwarz methods are compared experimentally with those of a variant of the standard multi-domain Schwarz alternating method. Numerical experiments performed on a distributed-memory multiprocessor computer illustrate the parallel efficiency of the methods.
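The Schwarz alternating procedure mentioned in the abstract can be illustrated on a toy problem. The sketch below is an assumption for illustration only (not the authors' code, and all function names are hypothetical): it solves -u'' = f on [0, 1] with two overlapping subdomains, [0, 0.6] and [0.4, 1], exchanging Dirichlet interface values once per sweep.

```python
import numpy as np

def solve_subdomain(f, a, b, ua, ub, n):
    """Solve -u'' = f on [a, b] with u(a) = ua, u(b) = ub by central differences."""
    h = (b - a) / (n + 1)
    x = np.linspace(a, b, n + 2)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    rhs = np.asarray(f(x[1:-1]), dtype=float).copy()
    rhs[0] += ua / h**2          # fold Dirichlet values into the right-hand side
    rhs[-1] += ub / h**2
    return x, np.concatenate(([ua], np.linalg.solve(A, rhs), [ub]))

def schwarz_alternating(f, sweeps=30):
    """Alternating Schwarz on the overlapping subdomains [0, 0.6] and [0.4, 1]
    with homogeneous outer boundary conditions; Dirichlet interface values
    are exchanged once per sweep."""
    g_at_06 = 0.0                                   # initial guess for u(0.6)
    for _ in range(sweeps):
        x1, u1 = solve_subdomain(f, 0.0, 0.6, 0.0, g_at_06, 59)
        g_at_04 = np.interp(0.4, x1, u1)            # pass u(0.4) to the right
        x2, u2 = solve_subdomain(f, 0.4, 1.0, g_at_04, 0.0, 59)
        g_at_06 = np.interp(0.6, x2, u2)            # pass u(0.6) to the left
    return g_at_04, g_at_06

# For f == 1 the exact solution is u(x) = x(1 - x)/2, so u(0.4) = u(0.6) = 0.12.
g04, g06 = schwarz_alternating(lambda x: np.ones_like(x))
```

With the 0.2-wide overlap the interface error shrinks by a constant factor each sweep, which is the convergence behaviour the two-level methods in the paper are designed to accelerate.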

Keywords: Parallel computing · Applied Mathematics · Numerical analysis · Mathematical analysis · Parallel algorithm · Domain decomposition methods · Singularly perturbed semilinear convection–diffusion problem · Multi-level Schwarz methods · Computational Mathematics · Additive Schwarz method · Distributed memory · Boundary value problem · Schwarz alternating method · Convection–diffusion equation · Mathematics
Journal of Computational and Applied Mathematics

The differences between distributed shared memory caching and proxy caching

2000

The authors discuss the similarities in caching between the extensively studied distributed shared memory systems and the emerging proxy systems. They believe that several of the techniques used in distributed shared memory systems can be adapted and applied to proxy systems.

Keywords: Distributed shared memory · Shared memory · Computer science · Shared disk architecture · Distributed computing · General Engineering · Interleaved memory · False sharing · Uniform memory access · Distributed memory · Data diffusion machine
IEEE Concurrency

A Low Cost Solution for 2D Memory Access

2006

Many of the new coding tools in the H.264/AVC video coding standard are based on 2D processing resulting in row-wise and column-wise memory accesses starting from arbitrary memory locations. This paper proposes a low cost solution for efficient realization of these 2D block memory accesses on sub-word parallel processors. It is based on the use of simple register-based data permutation networks placed between the processor and memory. The data rearrangement capabilities of the networks can further be extended with more complex control schemes. With the proposed control schemes, the networks enable row and column accesses from arbitrary memory locations for blocks of data while maintaining f…
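The row-wise versus column-wise access problem the paper targets can be modelled in a few lines. The sketch below is hypothetical (not the proposed hardware): it assumes simple low-order bank interleaving and shows why a column access whose stride (the row pitch) is a multiple of the bank count serializes on a single bank, the kind of conflict that data-rearrangement networks are meant to avoid.

```python
def banks_touched(base, count, stride, n_banks):
    """Banks hit by `count` accesses that start at `base` and step by `stride`,
    assuming simple low-order interleaving (bank = address mod n_banks)."""
    return [(base + i * stride) % n_banks for i in range(count)]

N_BANKS = 4
ROW_PITCH = 640   # elements per image row; here a multiple of N_BANKS

# A 4-element row read from an arbitrary address spreads over all banks...
row_banks = banks_touched(base=1237, count=4, stride=1, n_banks=N_BANKS)
# ...while a 4-element column read (stride = row pitch) hammers one bank.
col_banks = banks_touched(base=1237, count=4, stride=ROW_PITCH, n_banks=N_BANKS)
```

The row read touches four distinct banks and can complete in one cycle on a 4-bank sub-word parallel memory, whereas the column read hits the same bank four times.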

Keywords: Flat memory model · Shared memory · Computer science · Interleaved memory · Registered memory · Uniform memory access · Semiconductor memory · Distributed memory · Parallel computing · Memory map
2006 49th IEEE International Midwest Symposium on Circuits and Systems

Analyzing the Energy Efficiency of the Memory Subsystem in Multicore Processors

2014

In this paper we analyze the energy overhead incurred when operating with data stored in different levels of the memory subsystem (cache levels and DDR chips) of current multicore architectures. Our approach builds upon Servet, a portable framework for the memory characterization of multicore processors, extending this suite with a power-related test that, when applied to a platform equipped with a power measurement mechanism, provides information on the efficiency of memory energy usage. As additional contributions, i) we provide a complete experimental study of the impact that the CPU performance states (also known as P-states) exert on the memory energy efficiency of a collection of rece…

Keywords: Memory coherence · Memory management · Flat memory model · Shared memory · Computer science · Interleaved memory · Uniform memory access · Distributed memory · Semiconductor memory · Parallel computing
2014 IEEE International Symposium on Parallel and Distributed Processing with Applications

PenRed: An extensible and parallel Monte-Carlo framework for radiation transport based on PENELOPE

2021

Monte Carlo methods provide detailed and accurate results for radiation transport simulations. Unfortunately, the high computational cost of these methods limits their use in real-time applications. Moreover, existing computer codes do not provide a methodology for adapting this kind of simulation to specific problems without advanced knowledge of the corresponding code system, which restricts their applicability. To help overcome these limitations, we present PenRed, a general-purpose, stand-alone, extensible and modular framework code based on PENELOPE for parallel Monte Carlo simulations of electron-photon transport through matter. It has been implemented in C++ programming lan…
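The core sampling step of Monte Carlo radiation transport can be shown on a deliberately tiny example. The sketch below is not PenRed's API; it is a toy, absorption-only model (an assumption for illustration) in which photon free paths are drawn from the exponential distribution implied by an attenuation coefficient mu, so the transmitted fraction through a slab should approach the analytic Beer-Lambert value exp(-mu * thickness).

```python
import numpy as np

def transmitted_fraction(mu, thickness, n_photons, seed=0):
    """Fraction of photons whose first interaction lies beyond a homogeneous
    slab: free paths are sampled as s = -ln(xi)/mu (absorption only, no
    scattering), so the estimate converges to exp(-mu * thickness)."""
    rng = np.random.default_rng(seed)
    free_paths = rng.exponential(scale=1.0 / mu, size=n_photons)
    return float(np.mean(free_paths > thickness))

# With mu = 1 and a unit slab the analytic answer is exp(-1) ~ 0.368.
estimate = transmitted_fraction(mu=1.0, thickness=1.0, n_photons=200_000)
```

The statistical error shrinks only as 1/sqrt(N), which is exactly the computational-cost problem the abstract refers to: real codes simulate full electron-photon showers with scattering, making parallel (MPI/shared-memory) execution essential.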

Keywords: Parallel computing · Instrumentation and Detectors (physics.ins-det) · Atomic Physics (physics.atom-ph) · Fortran · Radiation transport · General Physics and Astronomy · Electron-photon showers · Computer Science and Artificial Intelligence · Monte Carlo simulation · MPICH · Computational Physics (physics.comp-ph) · Modular design · Medical Physics (physics.med-ph) · Shared memory · Hardware and Architecture · Programming paradigm · Distributed memory · MPI · Compiler

Comparison of parallel implementation of some multi-level Schwarz methods for singularly perturbed parabolic problems

1999

Abstract: Parallel multi-level algorithms combining a time discretization and an overlapping domain decomposition technique are applied to the numerical solution of singularly perturbed parabolic problems. Two methods based on the Schwarz alternating procedure are considered: a two-level method with auxiliary “correcting” subproblems, and a three-level method with auxiliary “predicting” and “correcting” subproblems. Moreover, modifications of the methods using time extrapolation on subdomain interfaces are investigated. Emphasis is given to the description of the algorithms and to their computer realization on a distributed-memory multiprocessor computer. Numerical experiments …

Keywords: Predictor–corrector method · Parallel computing · Singular perturbation · Partial differential equation · Discretization · Applied Mathematics · Mathematical analysis · Extrapolation · Domain decomposition methods · Computational Mathematics · Multi-level Schwarz method · Singularly perturbed parabolic problem · Distributed memory · Schwarz alternating method · Mathematics
Journal of Computational and Applied Mathematics

Moving Learning Machine Towards Fast Real-Time Applications: A High-Speed FPGA-based Implementation of the OS-ELM Training Algorithm

2018

Currently, some emerging online learning applications handle data streams in real time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully used in real-time condition prediction applications because of its good generalization performance at an extreme learning speed, but the number of trainings per second (training frequency) achieved in these continuous learning applications has to be further increased. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm for real-time applications. In this case, the natural way of feeding the training of the neural network is one-by-one, i.e., training the neur…
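The one-by-one training mode described here is, mathematically, a recursive least-squares update of the output weights over a fixed random hidden layer. The NumPy sketch below (hypothetical names, not the paper's FPGA implementation) assumes the standard OS-ELM formulation: a batch initialization phase followed by rank-one updates of the inverse Gram matrix P and the weights beta for each new sample.

```python
import numpy as np

rng = np.random.default_rng(42)

def hidden(X, W, b):
    """Hidden-layer output of the random-feature network."""
    return np.tanh(X @ W + b)

d, L = 3, 20                                   # inputs, hidden neurons
W, bias = rng.normal(size=(d, L)), rng.normal(size=L)
X = rng.normal(size=(200, d))
t = X.sum(axis=1, keepdims=True)               # toy regression target

# Initialization phase: batch least squares on the first N0 >= L samples.
N0 = 50
H0 = hidden(X[:N0], W, bias)
P = np.linalg.inv(H0.T @ H0)
beta = P @ H0.T @ t[:N0]

# Sequential phase: one-by-one recursive least-squares updates.
for i in range(N0, len(X)):
    h = hidden(X[i:i + 1], W, bias).T          # column vector, shape (L, 1)
    Ph = P @ h
    P -= (Ph @ Ph.T) / (1.0 + h.T @ Ph)        # Sherman-Morrison downdate of P
    beta += P @ h * (t[i] - h.T @ beta)        # correct with the new sample
```

Because the recursion is algebraically equivalent to batch least squares, the sequentially trained beta matches the solution computed on all samples at once; the hardware challenge the paper addresses is making each such update fast enough for high training frequencies.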

Keywords: Computer Networks and Communications · Computer science · Real-time computing · Parameterized complexity · Extreme learning machine · Sensitivity (control systems) · Electrical and Electronic Engineering · Computer engineering · Field-programmable gate array (FPGA) · Electrical engineering · Artificial neural network · Data stream mining · Networking & telecommunications · OS-ELM · Real-time learning · Hardware and Architecture · Control and Systems Engineering · On-chip training · Signal Processing · On-line learning · Artificial intelligence & image processing · Distributed memory · Online sequential ELM · Hardware implementation · Algorithm

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems

2016

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record, Jorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt; MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems, Bioinformatics, Volume 32, Issue 24, 15 December 2016, Pages 3826–3828, is available online at: https://doi.org/10.1093/bioinformatics/btw558 [Abstract] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-sca…

Keywords: Statistics and Probability · Source code · Computer science · Parallel computing · Biochemistry · Execution time · Cluster (physics) · Point (geometry) · Amino Acid Sequence · Molecular Biology · Sequence · Multiple sequence alignment · Computational Biology · Proteins · Markov Chains · Computer Science Applications · Computational Mathematics · Computational Theory and Mathematics · Distributed memory systems · MSAProbs · MPI · Data mining · Sequence Alignment · Algorithms · Software

Unified Parallel C++

2018

Abstract: Although MPI is commonly used for parallel programming on distributed-memory systems, Partitioned Global Address Space (PGAS) approaches are gaining attention for programming modern multi-core CPU clusters. They feature a hybrid memory abstraction: distributed memory is viewed as a shared memory that is partitioned among nodes in order to simplify programming. In this chapter you will learn about Unified Parallel C++ (UPC++), a library-based extension of C++ that combines the advantages of the PGAS and object-oriented paradigms. The examples included in this chapter will help you understand the main features of PGAS languages and how they can simplify the task of programming par…
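The hybrid memory abstraction described here can be caricatured in a few lines. The sketch below is not UPC++'s actual API (all names are hypothetical, and the "ranks" are just dictionaries in one process): it only models the PGAS idea that a single global index space is block-partitioned into per-rank local stores, while any rank can still read or write any index through the global interface.

```python
class ToyPGAS:
    """Toy model of a partitioned global address space: one global index
    space whose contiguous blocks live in per-rank local stores, yet any
    rank may read or write any index through the global interface."""

    def __init__(self, global_size, n_ranks):
        self.block = -(-global_size // n_ranks)   # ceil(global_size / n_ranks)
        self.local = [dict() for _ in range(n_ranks)]

    def owner(self, i):
        return i // self.block                    # rank whose memory holds i

    def put(self, i, value):
        self.local[self.owner(i)][i] = value      # a "remote put" if not owner

    def get(self, i):
        return self.local[self.owner(i)].get(i)   # a "remote get" if not owner

g = ToyPGAS(global_size=100, n_ranks=4)           # indices 0-24 on rank 0, etc.
g.put(37, "hello")                                # lands in rank 1's store
```

In a real PGAS system the owner computation is the same, but non-local puts and gets become one-sided network operations (asynchronous in UPC++), which is why locality-aware data placement still matters for performance.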

Keywords: Object-oriented programming · Source code · Computer science · Parallel computing · Shared memory · Asynchronous communication · Unified Parallel C · Distributed memory · Partitioned global address space · Abstraction (linguistics)

Distributed Computing on Distributed Memory

2018

Distributed computation is formalized in several description languages for computation, e.g. the Unified Modeling Language (UML), the Specification and Description Language (SDL), and Concurrent Abstract State Machines (CASM). All these languages focus on the distribution of computation, which is essentially the same as concurrent computation. In addition, there is the aspect of distribution of state, which is often neglected. Distribution of state is most commonly represented by communication between active agents. This paper argues that it is desirable to abstract from the communication and to consider an abstract distributed state. This includes semantic handling of conflict resolution, e.g. i…

Keywords: Computer science · Semantics (computer science) · Concurrency · Distributed computing · Software engineering · Specification and Description Language · Unified Modeling Language · Computation theory & mathematics · Abstract state machines · Distributed memory · Memory model · State (computer science)